A CLASSIFICATION AND RETRIEVAL SYSTEM FOR
RECORDED FOREIGN LANGUAGE TAPES 

J.B. Kay — A. Jameson 



In 1964 the Language Centre of the University of Essex proposed a scheme for
collecting samples of foreign languages recorded on magnetic tape. In 1965 a
Foreign Language Recordings Project was set up, entrusted to a committee and
financed by the Nuffield Foundation. Material now exists for French, German,
Portuguese, Russian and Spanish.

The classification system used at Essex may be of interest to other institutions
faced with the same problems. The corpus is first divided by language, each
language having its own series of numbers starting at 001 in order of accession.
The first letter of the code refers to the language (F for French, etc.). The
second marks the division between literary and non-literary material, literature
being in turn divided by genre: D = drama, P = prose, etc. A recording coded
RV 001 is thus immediately identified as Russian poetry, SD 007 as a Spanish
play, and so on. For non-literary recordings it has so far been necessary to
make do with relatively rough divisions (chemistry, economics, travel, war,
etc.), which will allow further subdivision as the material grows. The grid and
the Appendix give an overview of the system. The proposed system is not regarded
as ideal, only as an attempt to cope with the present situation at Essex.

Among the post-war developments in language teaching has been the increase
in the number and variety of technical aids available to teacher and student alike,
and notable among these is the tape recorder, which has enabled models of
native speech to be presented much more conveniently than previously. At the
same time, fresh thinking on the methods of teaching foreign languages has
placed much greater emphasis on the spoken language. These two developments
have together brought home the realisation that there is a great lack of suitable
practice material, especially of a specialised kind and in the restricted fields which
are of interest to a large number of adult learners. Obviously, whatever the
ultimate aims of the student, in the initial stages of learning a foreign language
there is the need for a similar linguistic content, but as the learner progresses
much greater attention must be given to the use he wishes to make of the
language he is learning, and courses should be appropriately designed for these
needs.

It was as a result of thinking along these lines that a proposal was made in
1964 by the Language Centre of the University of Essex to the National Committee
on Research and Development in Modern Languages for a scheme to collect
samples of foreign language recordings. Recordings of foreign language material
had, of course, begun to be acquired as soon as the University began to
appoint its first language teaching staff, but it was not until September 1965
that the Foreign Languages Recordings Project became officially designated and
a small staff was appointed to run it. The project is financed by the Nuffield
Foundation and has as its brief the collection of a corpus of recorded foreign
language material in the disciplines studied at Essex. Material is at present being
collected in the following languages: French, German, Portuguese, Russian
and Spanish, and is of interest then both for its content (to satisfy the needs of
university departments) and for its form (for course material).

It was clear at the outset that some form of classification system was already
necessary in order to allow students and staff to find out what was available in
recorded form, and to permit some system of storage which provided rapid
access to a given tape, and the need became more urgent as our collection began
to grow.

The classification system outlined below has been evolved with our particular
needs in view but it may be of interest to other institutions faced with similar
problems. We needed a convenient short coding for identification of tapes and
a detailed index in some accessible form of what each tape contained. No
originality is claimed for the coding system in use here, which is simply a
two-letter prefix followed by a three-figure serial number.

The first cut in the corpus of material is by language, and each language has
its own series of numbers starting from 001 and deriving from chronological
order of accession or processing. The first letter of the reference coding is
therefore the initial letter of the language we are dealing with: F for French,
G for German, etc. Possible later ambiguities may be removed by adding one or more
lower case letters (for example, Danish material would be designated Da and
Dutch Du) but at present no such difficulties exist here because of the limited
range of languages required by our degree course structure.

The second major cut in the collection is between literary and non-literary
material. This somewhat arbitrary division was prompted both by the greater
accessibility of the former material compared with the latter and by the
demands of register classification. We have found it convenient to use this
division to provide us with two separate card-index systems, suitably
cross-referenced: (1) an Index by Authors and (2) an Index by Symbols. The Index by
Authors forms a literary catalogue in which the material is arranged alphabetically
by the author’s surname in the conventional manner. In addition, in the
opposite top corner of each index card is the tape reference coding which
contains the formal category, for literature is sub-divided into formal categories
which are indicated by the second letter of the reference coding. Again the
notation is as broad and as simple as possible: D for Drama, P for Prose, S for
Song and V for Verse. The second letter of the reference for literature is
therefore mnemonic, like the first letter denoting language. Thus a tape coded RV 001
would be immediately identified as a recording of Russian poetry; similarly
SD 007 would be a recording of a Spanish play, FS 027 a collection of French
songs.
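
For readers who wish to mechanise the coding, the literary references described above can be decoded as in the following minimal sketch. It is an illustration only and forms no part of the card index at Essex; the names `LiteraryCode` and `parse_literary_code` are assumptions, while the letter tables are those given in the text and the Appendix.

```python
import re
from typing import NamedTuple

# Letter tables as given in the text and the Appendix.
LANGUAGES = {"F": "French", "G": "German", "P": "Portuguese",
             "R": "Russian", "S": "Spanish"}
LITERARY_FORMS = {"D": "Drama", "P": "Prose", "S": "Song", "V": "Verse"}


class LiteraryCode(NamedTuple):
    """Decoded tape reference (hypothetical type, for illustration)."""
    language: str
    form: str
    serial: int


# Two capital letters, an optional space, then exactly three figures.
_CODE = re.compile(r"([A-Z])([A-Z]) ?(\d{3})")


def parse_literary_code(code: str) -> LiteraryCode:
    """Decode a reference such as 'RV 001' into language, form and number."""
    match = _CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a two-letter prefix plus three figures: {code!r}")
    lang, form, serial = match.groups()
    if lang not in LANGUAGES or form not in LITERARY_FORMS:
        raise ValueError(f"unknown language or literary form in {code!r}")
    return LiteraryCode(LANGUAGES[lang], LITERARY_FORMS[form], int(serial))


for example in ("RV 001", "SD 007", "FS 027"):
    print(example, "->", parse_literary_code(example))
```

The sketch rejects any second letter outside D, P, S and V, since the remaining letters belong to the non-literary index.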

In the literary index, the classification is strictly formal, so that Eugeni Onegin
(described by Pushkin as a “novel in verse”) is RV, while dramatised prose
works, for which the Russians have a special weakness, are RD. On the other
hand some items which consist of passages extracted from standard French
prose works are labelled ‘FP’ despite the fact that they have been dramatised for
recording by the use of a number of speakers for the characters and of sound
effects (e.g. extracts from Hugo’s Les Misérables).

Under literature we include recordings of criticism in literary form,
reminiscences and memoirs, etc., so long as the material is a reading and is not being
composed as it is delivered. Some inconsistencies have been unavoidable. For
example, the Russian comedian Arkadi Raikin has a number of recordings
consisting of sketches and longer pieces in dramatic form which we have classified
as RD and indexed under Raikin, although they were not written by him. This
course has been adopted because no-one would think of looking for a Raikin piece
under the name of the relatively obscure writer of his sketch.

For purposes of cross-reference, each author has a card on which are entered
all substantial references to him in the work of other people in any other
recordings we have, and this card is arbitrarily styled an Author card. Similarly, in the
case of anthologies and collections which cannot be conveniently broken up and
recorded under particular authors there will be special cards filed under
Anthology and giving full details of the contents, with separate entries on the
relevant author cards. Subject matter of the items in the literary index is not
separately entered, a recognition of the primarily formal nature of literature.
When we came to consider the non-literary material we were faced by much
more difficult problems. For consistency, it was again decided to adopt the
principle that one card should contain the full details of the tape and that there
should be cross-references in abbreviated form on cards in general categories
related to subject. However, a way needed to be found of classifying non-literary
material such that information about both its subject matter and its form was
provided. The index eventually decided on is divided into a number of categories
distinguished by letter symbols, together with a number of subject or topic
cards. The subject cards themselves cover fairly broadly defined fields which will
allow later sub-division as required, and, of course, they may be added to as new
incoming material may demand. Among the subject headings so far used are the
following: Chemistry, Course Material, Economics, Historic Occasions, History,
Life and Work, Linguistics, Medicine, Performing Arts, Philosophy, Physics,
Politics, Travel, War and Military. A full list is given in the appendix.

It was in coming to a decision about dividing the material into formal
categories that we found ourselves attempting something infrequently done. In the
end we have analysed our material by reference to two fairly arbitrarily chosen
parameters, namely that of the number of participants in a speech event, that is
one or more than one, and that of the degree of preparedness for the event. From these
two axes it is possible to construct a grid covering most types of spoken
(non-literary) material.



Classification of spoken non-literary recordings by formal criteria

              1.               2.               3.               4.
Monologue     Unprepared       Prepared but     From notes       Fully prepared
              (novel speech)   unscripted       (lecture)        (address, etc.)
                               (exposition)
              Z                Y                X                W

              5.               6.               7.               8.
Polylogue     Unprepared                        One speaker      Both or all
              (free                             prepared         speakers
              discussion)                       (interview)      prepared
                                                                 (debate)
              T                                 R                Q


In practice, cell 1 is quite rare and the result of a somewhat artificial situation,
e.g. the disembodied voice of the radio commentator. It can also be elicited by
asking people questions requiring off-the-cuff exposition, such as ‘Can you
describe how you put on your overcoat?’ or ‘Can you explain to me how to
make a call from a public telephone box?’ There is certain to be a degree of
preparation even for this type of exercise in so far as the speaker must rapidly
marshal his thoughts into some kind of logical order. Cell 2 provides in fact for
the general type of monologue, since most people who embark on monologues
have a predetermined line of thought to pursue, as, for example, in describing
a process (exposition), telling a joke or a story, giving a series of political
exhortations, and so on. Cell 3 follows Cell 2 along the preparedness parameter,
the degree of predetermination being increased by the preset length of the event.
To this category are assigned, for example, all lectures. Cell 4 is what
Abercrombie 1) calls ‘spoken prose’ and would include the majority of sound
broadcasting (e.g. news bulletins) and formal set pieces, e.g. the Queen’s speech
from the throne or other things in a similar ‘frozen’ style.

Turning to the polylogues, Cell 5 is the typical polylogue type which
Abercrombie (op. cit.) labels conversation, in which “there is opportunity for give
and take”. It includes discussion and conversation of all degrees of formality.
Cell 6 is blank for the reason that a conversation or discussion in which all the
participants knew what they were going to say would be pointless, although it
is conceivable that certain radio broadcasts or political meetings might fall within
this category. However, in this case there would be no true discussion and the
speech event would more correctly be classified as a series of monologues of the
type in Cell 2. Cell 7 covers interviews, interrogations, discussions after seminars
(i.e. discussion groups with a group leader to guide their progress), and so on,
while Cell 8 would include debates (before being thrown open to the house),
formal disputations, court proceedings, etc.

Having established these seven formal categories it was then necessary to
designate a letter code to follow the initial language code letter as in the case
of the literary material. A mnemonic code, while desirable, proved impracticable;
so a series of arbitrary letters was chosen from those not already in use for the
literary classes, and these are shown in the grid. Thus a tape coded FR 039
would be immediately known to be an interview or discussion in French in
which one of the participants had prepared his questions in advance. Similarly,
RX 047 would indicate a tape consisting of a lecture prepared with notes and
given in Russian. In practice we add the subject classification in abbreviated form
after the tape coding, e.g. RX 047 Hist. (history) or SX 024 Ling. (linguistics). The
grid thus received the letters: Cell 1: Z, Cell 2: Y, Cell 3: X, Cell 4: W,
Cell 5: T, Cell 6: -, Cell 7: R, Cell 8: Q.
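
The two parameters of the grid, together with the subject suffix, can be recovered from a non-literary reference as in the sketch below. This is offered only as an illustration under stated assumptions: the function name `decode_non_literary` and the dictionary layout are hypothetical, and only the two subject abbreviations quoted in this paragraph are included.

```python
import re

LANGUAGES = {"F": "French", "G": "German", "P": "Portuguese",
             "R": "Russian", "S": "Spanish"}

# Grid letters: (number of participants, degree of preparedness).
FORMAL = {
    "Z": ("monologue", "unprepared (novel speech)"),
    "Y": ("monologue", "prepared but unscripted (exposition)"),
    "X": ("monologue", "from notes (lecture)"),
    "W": ("monologue", "fully prepared (address, etc.)"),
    "T": ("polylogue", "unprepared (free discussion)"),
    "R": ("polylogue", "one speaker prepared (interview)"),
    "Q": ("polylogue", "both or all speakers prepared (debate)"),
}

# Only the abbreviations quoted in the paragraph above; the full list
# is in the Appendix.
SUBJECTS = {"Hist.": "History", "Ling.": "Linguistics"}

# Prefix, three figures, then an optional abbreviated subject.
_CODE = re.compile(r"([A-Z])([A-Z]) ?(\d{3})(?:\s+(.+))?")


def decode_non_literary(reference: str) -> dict:
    """Decode a reference such as 'RX 047 Hist.' (illustrative helper)."""
    match = _CODE.fullmatch(reference.strip())
    if match is None:
        raise ValueError(f"malformed tape reference: {reference!r}")
    lang, form, serial, subject = match.groups()
    if lang not in LANGUAGES or form not in FORMAL:
        raise ValueError(f"unknown language or formal category: {reference!r}")
    participants, preparedness = FORMAL[form]
    return {
        "language": LANGUAGES[lang],
        "participants": participants,
        "preparedness": preparedness,
        "serial": int(serial),
        # Unknown abbreviations are passed through unchanged.
        "subject": SUBJECTS.get(subject, subject),
    }


print(decode_non_literary("FR 039"))
print(decode_non_literary("RX 047 Hist."))
```

Cell 6 has no letter and therefore no entry in the table, which mirrors the blank in the grid.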

The classification of non-literary material was largely devised in this formal
way to make linguistic analysis easier in the preparation of course materials. It
is an open-ended system in so far as there is room for the creation of new
categories if these are found to be desirable. For instance, we have added a formal
category ‘N’ which is given to types of material of mixed formal content but
which are of frequent occurrence, for example, news commentary or documentary
which contains extracts from speeches, commentaries, sound effects,
interviews and so on. Similarly, the designation ‘C’ for course material (which is both
a ‘subject’ and a formal category, as its topic is in most cases its own form) is a
broad classification which may include pronunciation practice, structure drills,
dialogues and so on. Course material may be subdivided as required into
various ‘topics’ which can be shown as a suffix to the tape coding in the same
way as the other material in the non-literary index.

1) D. Abercrombie: Conversation and Spoken Prose, the first of four public lectures
on spoken language given at the University of Ghana in Feb. 1959. Published in
E.L.T. Vol. XVIII No. 1 1963 and republished in Studies in Phonetics and Linguistics.
O.U.P. London 1965.

A word may not be out of place about the allocation of numbers to the tapes. 
As a matter of principle, numbering systems are less adaptable to mnemonic use, 
although we found we soon had to abandon the attempt to find unambiguous 
letters for the non-literary items. However, the number given to a particular tape 
is purely an access number and is therefore a check of uniqueness. We have 
decided against the allocation of blocks of numbers for particular types of mate- 
rial since we cannot foresee the extent to which various parts of the collection 
will grow. It is of no use to make a numerical principle for the arranging of the 
collection since one is always likely to find subsequent additions requiring the 
creation of multiple blocks and increasing the complexity of storage. To sum 
up, then, each language begins its tape numbering from 001 and continues to 
increase serial numbers indefinitely. Tapes can be shelved first by the language 
designation and thereafter in numerical order, or if desired alphabetically by 
the second letter of the prefix and numerically thereafter. It makes little difference 
in practice since we do not have an open access tape library and tapes are issued 
by a qualified assistant. A student merely needs to find the material he wants in 
the card index either by author, formal category or topic, to note the number 
and to ask the assistant for the appropriate tape. 
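
The numbering and the two shelving orders just described can be sketched as follows. The class `AccessionRegister` and the two sorting functions are hypothetical names introduced for illustration; the sketch assumes references written as a two-letter prefix, a space and the serial number.

```python
from collections import defaultdict


class AccessionRegister:
    """Issues serial numbers in one running series per language."""

    def __init__(self):
        # Last number issued for each language letter; starts at 0.
        self._last = defaultdict(int)

    def next_code(self, language: str, category: str) -> str:
        """Return the next reference, e.g. 'FP 001', for a new tape."""
        self._last[language] += 1
        return f"{language}{category} {self._last[language]:03d}"


def shelve_by_number(codes):
    """Language first, then numerical order (second letter ignored)."""
    return sorted(codes, key=lambda c: (c[0], int(c.split()[1])))


def shelve_by_category(codes):
    """Language, then second letter of the prefix, then number."""
    return sorted(codes, key=lambda c: (c[0], c[1], int(c.split()[1])))


register = AccessionRegister()
tapes = [
    register.next_code("F", "R"),  # first French tape
    register.next_code("R", "V"),  # first Russian tape
    register.next_code("F", "D"),  # second French tape
    register.next_code("F", "P"),  # third French tape
]
print(tapes)
print(shelve_by_number(tapes))
print(shelve_by_category(tapes))
```

No blocks of numbers are reserved for any category: the number depends only on the language and the order of accession, as the text requires.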

From the card index system it is possible to compile registers with brief
entries of what is available in the collection, and the form the registers take will
depend on the needs of the departments interested: literary items, for example,
can simply be listed under each language in alphabetical order of author, while
items of interest to sociologists can be listed under language and/or topic, and
so on.
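
As an illustration of how such a register might be compiled from the cards, the sketch below lists literary items under each language in alphabetical order of author. The `Card` type and the function `literary_register` are assumptions made for the example, and the serial numbers shown are invented; only the authors and the formal categories come from the text.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Card:
    """A simplified index card (hypothetical layout)."""
    code: str    # tape reference, e.g. "RV 002"
    author: str
    title: str


def literary_register(cards):
    """Map each language letter to brief entries sorted by author."""
    ordered = sorted(
        cards, key=lambda c: (c.code[0], c.author.lower(), c.title.lower())
    )
    return {
        letter: [f"{c.author}: {c.title} ({c.code})" for c in group]
        for letter, group in groupby(ordered, key=lambda c: c.code[0])
    }


# Serial numbers below are invented for the example.
cards = [
    Card("RV 002", "Pushkin", "Eugeni Onegin"),
    Card("FP 001", "Hugo", "Les Misérables (extracts)"),
    Card("RD 001", "Raikin", "Sketches"),
]
register = literary_register(cards)
for letter, entries in register.items():
    print(letter, entries)
```

A register by topic for the non-literary material could be built in the same way by grouping on the abbreviated subject instead of the author.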

For administrative convenience we have adopted a colour coding system as
a check that tapes and records do not become mislaid. Each language uses
different coloured index cards and corresponding colour flashes on the spines
of the tape boxes and on the spools themselves. In addition, the content of the
tape is briefly noted inside the box and each spool is labelled with the tape
number and its content; this information is also duplicated on the leader.
Furthermore, as a safeguard against accidental erasure or other damage, original
recordings are treated as archive material and are never normally used once they
have been indexed and processed. These original recordings bear the suffix M
after the tape code reference to indicate an ‘archive’ master, while a second
master tape, copied from the original, is designated by the suffix M2. This
second master is available for use by the teaching staff and is also the tape from
which student copies will be made if required. Thus it is a relatively simple
matter to make a new working copy (M2) from the archive master should it
become necessary. The economic disadvantage of having to keep a large and
steadily increasing number of tapes lying idle is, we feel, justified and to some
extent offset by the possibly greater cost which might arise in replacing lost
material, if in fact it could be replaced at all.

In conclusion, it is not suggested that the tape classification system outlined
above is ideal; merely that it appears to meet our particular needs at Essex, that
it is relatively easy to operate and that it provides ready access both to the tapes
themselves and to detailed notes of their content. We should like to express our
thanks for their suggestions and help to our colleagues in the university library,
especially to Mr. D.B. Butler, but for all inconsistencies and shortcomings we
ourselves are entirely responsible.

Prof. J. B. Kay and A. Jameson 
University of Essex 
Colchester /Great Britain 



APPENDIX 

LIST OF TAPE REFERENCE CODES 

(1) First Letter: Language

F French
G German
P Portuguese
R Russian
S Spanish

(2) Second Letter: Formal Category

(a) Author Index (Literature)

D Drama
P Prose
S Song
V Verse



(b) Symbol Index (Non-literary Material)

Z Monologue: unprepared (novel speech)
Y Monologue: prepared but unscripted (exposition, etc.)
X Monologue: from notes (e.g. lecture)
W Monologue: fully prepared (address, etc.)
T Polylogue: unprepared (free discussion)
R Polylogue: one speaker prepared (interview, etc.)
Q Polylogue: both or all prepared (debate, etc.)
N Mixed formal content (e.g. news commentary/documentary)
C Course Material

(3) Number: Order of Access

Separate series starting with 001 for each language



(4) Abbreviated Subject in Tape Titles
(for non-literary recordings)

Biol.        Biology               Nat. Hist.   Natural History
Chem.        Chemistry             Perf. Arts   Performing Arts
Econ.        Economics             Plas. Arts   Plastic Arts
Hist. Occ.   Historic Occasions    Phil.        Philosophy
Hist.        History               Phys.        Physics
Law          Law                   Pol.         Politics
Ling.        Linguistics           Soc.         Sociology
Med.         Medicine              Trav.        Travel
Milit.       War & Military

Note: This list is being extended as the collection of recordings grows and is 
subject to revision. 